University of Konstanz - Using Visual Analytics to Uncover the Course of a Pandemic
VAST 2010 Challenge
Grand Challenge: Arms Dealing and Pandemic
Authors and Affiliations:
Peter Bak, University of Konstanz, bak@dbvis.inf.uni-konstanz.de [PRIMARY contact]
Christian Rohrdantz, University of Konstanz, christian.rohrdantz@uni-konstanz.de
Curran Kelleher, University of Konstanz, kelleher@dbvis.inf.uni-konstanz.de
Tool(s):
We used many freely available tools for analysis and visualization:
- KNIME, a pipeline-based data exploration platform developed at the University of Konstanz
- Visone, a social network analysis package developed at the University of Konstanz
- ManyEyes, a web-based visualization platform developed by IBM
- STRAP, a tool for working with multiple sequence alignments developed by Charité Humboldt-University Berlin
- Dendroscope, a tool for visualizing phylogenetic trees and rooted networks developed by the University of Tübingen
Several small programs were created by our students for task-specific data preprocessing and visualization activities.
Video:
Link: Our submission video
ANSWERS:
Introduction
The 2010 VAST Challenge data set has three parts: textual arms dealing intelligence,
daily hospitalization records across many countries, and a set of
genome sequences for the deadly Drafa Virus. In the current debrief, we
aim to synthesize the knowledge we have gleaned from all these data
sources into one detailed description of how we understand the
situation. For this purpose, we first target each of the data sources
separately, then use the collected pieces of information for describing
the larger story.
After detailed analysis of all data sources, we conclude that a series
of meetings, which took place in Dubai between April 15th and 21st, is
at the crux of the pandemic. The participants in these meetings, mostly
arms dealers, were infected by one of their accomplices and carried the
virus to a number of countries, causing a pandemic outbreak and the
death of millions. In the following we will present the collection of
facts leading to our description of the events and interpretations.
Further, we will define a set of actions that should be taken to answer
some of the open questions still remaining. The rest of the document is
organized by the three very specific tasks of the Grand Challenge,
starting with a brief overview of the general methodology applied for
all tasks.
Methodology
All the results presented are the outcome of student projects from a
course held at University of Konstanz dedicated to the VAST Challenge.
Small student groups were assigned to each mini challenge. Students had
to analyze the data provided and answer the tasks of their mini
challenge as part of the course requirements. The methodologies applied
for all mini challenges used a tight integration of automatic
information extraction with interactive visualization. The application
of visualizations was twofold: they were used to generate new
hypotheses as well as for exploring and confirming the output of
automated algorithms. In order to extract the required information from
the provided data, it was crucial to introduce visualization at all
stages of the analytic process, rather than to design more
sophisticated and innovative ways of visualizing the underlying data.
In conclusion, the contribution of the current work lies in its
methodology, which combines automatic algorithms and visualization,
rather than in the uniqueness of the techniques.
Linkage between the arms dealing activity and the pandemic outbreak
Analysis
We have identified a series of meetings in Dubai between many illegal
arms dealers as a key element in the development of the Drafa Virus
pandemic. A Russian arms dealer, Mikhail Dombrovski, had planned to
meet Dr. George Ngoki (referred to as “Dr. George”) on April 15th in
Dubai. Dr. George is from Nigeria, which was confirmed to be the origin
of the virus by our investigation of Mini Challenge 3. We believe the
virus was initially transferred from Dr. George to Dombrovski at their
meeting on the 15th, then subsequently spread through the arms dealing
network active in Dubai at that time. We assume that the people at the
meetings returned to their home countries afterwards, and that these
infections were the first of the pandemic in these countries.

Figure 1: A representation of planned meetings in Dubai between the
15th and 21st of April, 2009. Rows represent the participants in the
meetings (individual name and country of origin). Columns represent
days of April in which the meetings took place. Color represents
infection status.
Drawing from the remaining collection of meeting plans discovered in
the document collection, we attempted to construct the complete pathway
of people through which the virus spread. To construct this pathway we
used information about planned meetings found in the intelligence
documents (we made the
simplifying assumption that all meetings took place as planned). Figure
1 expresses this pathway visually. Each circle represents an appearance
of a person at a meeting. Red represents infection, green represents
non-infection, and orange represents likely infection. We are fairly
certain that after the initial transfer from Dr. George to Dombrovski,
Dombrovski infected Nicolai and Saleh Ahmed, Ahmed infected “Brother
Haik”, and Nicolai infected Igor. All of these people are known to be
part of an illegal arms dealing network.
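The pathway construction described above can be illustrated with a minimal sketch. It is not the program used for Figure 1; it only shows one way to propagate infection status through a schedule of planned meetings, under the same simplifying assumption that all meetings took place as planned. The participant labels, meeting days, and the rule that a newly infected person becomes contagious only from the following day are all assumptions made for this example.

```python
from collections import defaultdict


def propagate_infection(meetings, patient_zero):
    """Return {person: day of first exposure} for a planned meeting schedule.

    meetings: iterable of (day, participants) pairs.
    Assumption (ours): someone infected on day d is contagious from day d + 1,
    so infection cannot hop through two meetings on the same day.
    """
    infected_on = {patient_zero: None}  # None = infected before any meeting
    by_day = defaultdict(list)
    for day, participants in meetings:
        by_day[day].append(set(participants))

    for day in sorted(by_day):
        contagious = set(infected_on)  # snapshot taken at the start of the day
        for participants in by_day[day]:
            if participants & contagious:
                for person in participants:
                    infected_on.setdefault(person, day)
    return infected_on


# Toy schedule with placeholder labels, not the meetings from the documents.
meetings = [
    (15, ["source", "dealer_a"]),
    (15, ["dealer_a", "dealer_b"]),  # same day: dealer_a is not yet contagious
    (17, ["dealer_a", "dealer_c"]),
    (18, ["dealer_b", "dealer_d"]),  # nobody infected attends
    (19, ["dealer_c", "dealer_e"]),
]
result = propagate_infection(meetings, "source")
for person, day in result.items():
    print(person, day)
```

Participants who never appear in the result correspond to the green (non-infected) circles in a chart like Figure 1. A "likely infected" category would need an additional rule, for example for participants whose home country later experienced the outbreak.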
We hypothesize that the participants from Lebanon and Kenya were also
infected at some point in Dubai, because those countries experienced
the viral pandemic. These people may have had undocumented meetings
with infected individuals in Dubai. However, it is also possible that
the virus was spread to these countries through other paths. We do not
have enough information to make a conclusion with certainty.

Figure 2: A visualization of a phylogenetic tree computed from the
viral sequence data. Distance from
the center corresponds roughly to edit distance of the sequence from
the original strain, Nigeria B. Color represents severity of the
symptoms caused by the strain.
In addition to the story in Dubai, phylogenetic analysis of the viral
sequence data reveals that Nicolai was a carrier of a strain which
evolved to become very deadly. Figure 2 visualizes the results of the
phylogenetic analysis (the tree structure), along with the severity of
each strain (color). The strain carried by Nicolai appears in the
figure as strain number 583. This strain itself is quite severe, and
its offspring strains (nodes branching from node 583 away from the
center of the figure) are very severe as well. This supports our
hypothesis that Nicolai was the one who infected others with the strain
which evolved into the most deadly variants of the virus, spreading
through arms dealers in Dubai and eventually causing the pandemic.
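The caption of Figure 2 relates distance from the center to the edit distance between a sequence and the original strain. As an illustration only, the following sketch computes the Levenshtein edit distance between a reference sequence and a few other sequences. The sequences and strain labels are invented for the example, and the actual tree was built with the tools listed above, not with this code.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of single-character insertions,
    deletions, and substitutions needed to turn sequence a into sequence b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


# Toy sequences (hypothetical, far shorter than real viral genomes).
reference = "ACGTACGT"
strains = {
    "strain_1": "ACGTACGT",
    "strain_2": "ACGTTCGT",
    "strain_3": "ACGACGTT",
}
distances = {name: edit_distance(reference, seq) for name, seq in strains.items()}
for name in sorted(distances, key=distances.get):
    print(name, distances[name])
```

Sorting strains by this distance gives a rough ordering of how far each has drifted from the reference. A phylogenetic tree additionally infers which strain descends from which, and a distance to the root alone does not provide that.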

Figure 3: Temporal patterns of hospitalization and death among
countries. Time is represented on the x-axes, and the normalized number
of patients in each country on the y-axes. Noise was removed by only
considering symptoms caused by the virus (symptoms which in general
match the characteristic pandemic temporal pattern). Color
differentiates between hospitalizations and deaths.
The hospital records were used to generate Figure 3, which illustrates
the temporal characteristics of the pandemic across countries. Each
plot represents a country. The plots are ordered left to right and top
to bottom by date of peak hospitalization. We used the peaks as
indicators of temporal sequence, as onset data was very noisy.
According to this metric, the pandemic occurred first in Nairobi,
Kenya. This agrees with our aforementioned hypothesis that Dr. George,
from Nairobi, was a key link in spreading the disease to other
countries through the Dubai meetings, as outlined in Figure 1.
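The ordering by peak hospitalization can be sketched as follows. This is a minimal illustration, not the code behind Figure 3. The region labels and daily counts are made up, and scaling each series by its own maximum is only one possible reading of "normalized", since the text does not specify the normalization.

```python
def normalize(counts):
    """Scale a daily series so its maximum is 1.0 (assumed normalization)."""
    peak = max(counts)
    return [c / peak for c in counts] if peak else list(counts)


def peak_day(counts):
    """Index of the day with the highest count (first one if there are ties)."""
    return max(range(len(counts)), key=counts.__getitem__)


def order_by_peak(series_by_region):
    """Region names sorted by the day on which their series peaks."""
    return sorted(series_by_region, key=lambda r: peak_day(series_by_region[r]))


# Toy daily hospitalization counts (hypothetical values).
series = {
    "region_x": [1, 4, 9, 5, 2],
    "region_y": [0, 1, 3, 8, 4],
    "region_z": [2, 7, 3, 1, 0],
}
order = order_by_peak(series)
print(order)
print(normalize(series["region_x"]))
```

Using the peak rather than the first day above some threshold avoids having to pick that threshold on noisy onset data, at the cost of assuming that the curves have a similar shape in every country.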
Based on our knowledge of the arms dealing meetings in Dubai, it is not
clear how the virus reached Kenya and Lebanon, the first two countries
to experience the pandemic outbreak. The people identified as arms
dealers from these two countries first met with Nicolai in Dubai before
the 19th of April, when Nicolai was likely infected by Dombrovski.
Therefore, we hypothesize that there were additional, undocumented
meetings in Dubai between infected members of the arms dealing network
and those from Kenya and Lebanon, but for this we have no clear
evidence. We are, however, fairly confident that members of the arms
dealing network from Yemen, Saudi Arabia and Iran were infected by
Nicolai and Saleh Ahmed in Dubai, then carried the disease back to
their countries of origin.

Figure 4: Geographic maps which show the countries involved in arms
dealing and the pandemic. In the map on the left, color represents an
interestingness metric for each country, computed from country name
frequency weighted by time (more recent occurrences have higher weight)
across all intelligence documents. In the map on the right, countries
affected by the pandemic are colored by their mortality rate.
We mapped our previous results from both the document collection and
the hospitalization records to a geographic map for further analysis.
These results are shown in Figure 4. The first interesting finding was
that Thailand and Turkey are included in the data on the pandemic
outbreak, but do not show its characteristic mortality rate. We can
also observe in Figure 3 that
hospitalizations and deaths in Thailand and Turkey do not exhibit the
characteristic pandemic curve over time. From this observation, we
conclude that the virus never reached these countries. This fact agrees
with our proposed path of viral spread during the Dubai meetings:
though arms dealers from Thailand and Turkey were indeed present in
some of the meetings (Hakan, Celik and Boonmee), they were never
exposed to an infected individual according to the facts we have
assembled and visualized in Figure 1.
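The interestingness metric from the Figure 4 caption (country name frequency weighted by time, with more recent occurrences weighted higher) can be sketched as below. The exact weighting function is not stated in the text, so the exponential decay with a 30-day half-life used here is purely an assumption, as are the country labels and day numbers.

```python
from collections import defaultdict


def interestingness(mentions, now, half_life_days=30.0):
    """Sum a recency weight over every mention of each country.

    mentions: iterable of (country, day_number) pairs, one per occurrence
    of a country name in a document dated day_number.
    Assumption (ours): a mention loses half its weight every half_life_days.
    """
    scores = defaultdict(float)
    for country, day in mentions:
        age = now - day
        scores[country] += 0.5 ** (age / half_life_days)
    return dict(scores)


# Toy mentions (hypothetical): country_a is mentioned earlier than country_b.
mentions = [
    ("country_a", 0),
    ("country_a", 30),
    ("country_b", 60),
    ("country_b", 60),
]
scores = interestingness(mentions, now=60)
for country, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(country, round(score, 3))
```

Both countries are mentioned twice, but the more recent mentions of country_b give it the higher score, which is the behavior the caption describes.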
Lessons Learned
Our visual analytics based approach was validated by the quality of our
results. Nevertheless, the advantages and drawbacks of our approach
should not be left undiscussed. As a guiding principle, we always aimed
at involving the human as early as possible in the analytic process.
This involvement was mostly supported by standard visualization
techniques. Also, we tried to support this process by automatic
analysis techniques. Their combination is, in our opinion, the most
successful approach. However, the data, task descriptions and
background information provided by the challenge committee were very
diverse, complex and multimodal, making the analytic process a true
challenge. We experienced great difficulties in fitting the extracted
pieces of the puzzle together. We therefore often made simplifying
assumptions for the sake of convenience, some of which would likely be
unacceptable in real-world scenarios. Our theories, though partially
supported by evidence and always logical, are not fully consistent. For
example, we could not explain how the arms dealers from Kenya (Owiti
and Otieno) got infected and carried the virus to Kenya. There is also
no clear evidence of what the participants of the meetings in Dubai did
afterwards; we assume that they went back to their home countries, but
there is no direct evidence for this. It also remains an open question
who actually participated in the meetings. We only know that they were
planned. Further intelligence should be gathered about the meetings
themselves. The intelligence also does not reveal how the virus is
transmitted. The medical reports should have provided this piece of
information. The most problematic gap in our assumptions is that we are
unclear about the circumstances of Dr. Ngoki (aka Dr. George): what
kind of “medical project” he was working on, and how he got infected in
the first place.